CI builds for non-amd64 architectures #2401
Codecov Report
@@ Coverage Diff @@
## python #2401 +/- ##
======================================
Coverage 73% 73%
======================================
Files 393 393
Lines 18477 18477
======================================
Hits 13533 13533
Misses 4944 4944
|
@RudolfWeeber, can you have a look at why the collision detection test fails on i386? @fweik, can you have a look at why I needed this terrible hack to make particle deletion work on i386? I still need to fix up the s390x (big-endian) emulation problem. |
I think I finally reproduced the problem of hanging tests that @junghans reported for i386. There is an MPI deadlock which the npt testcase sometimes runs into. @fweik, any ideas? rank 0:
rank 1:
|
#2410 contains a solution for the particle removal issue; there was actually a bug. I didn't have any luck with the deadlock on Friday, but I'm looking into it again now. |
I have fixed the s390x issue and @fweik's fix for removing particles works. |
My impression is that (mismatch 33.3333333333%) Ran 8 tests in 2.029s FAILED (failures=1) The box length is 1. |
Argh, some of the collectives are also implemented in terms of isend and
suffer from the same bug. I'll look into that, but this may not be feasible
to fix.
On Tue, Dec 18, 2018, 11:38 Michael Kuron wrote:
i386 still has a deadlock during *interactions_non-bonded*:
rank 0:
#8 0xf70a2f61 in PMPI_Gather () from target:/usr/lib/i386-linux-gnu/libmpi.so.20
#9 0xf6e89d2c in mpi_who_has () at /builds/espressomd/espresso/src/core/particle_data.cpp:144
#10 0xf6e89fc4 in build_particle_node () at /builds/espressomd/espresso/src/core/particle_data.cpp:171
#11 0xf6e8a047 in get_particle_node (id=<optimized out>) at /builds/espressomd/espresso/src/core/particle_data.cpp:181
#12 0xf6e8b03c in get_particle_data (part=<optimized out>) at /builds/espressomd/espresso/src/core/particle_data.cpp:375
#13 0xe8179841 in __pyx_f_10espressomd_13particle_data_14ParticleHandle_update_particle_data (__pyx_v_self=0xd93273f8)
at /builds/espressomd/espresso/build/src/python/espressomd/particle_data.cpp:3550
#14 0xe8180946 in __pyx_pf_10espressomd_13particle_data_14ParticleHandle_3pos_2__get__ (__pyx_v_self=0xd93273f8)
at /builds/espressomd/espresso/build/src/python/espressomd/particle_data.cpp:4571
#15 __pyx_pw_10espressomd_13particle_data_14ParticleHandle_3pos_3__get__ (__pyx_v_self=0xd93273f8)
at /builds/espressomd/espresso/build/src/python/espressomd/particle_data.cpp:4550
#16 __pyx_getprop_10espressomd_13particle_data_14ParticleHandle_pos (o=0xd93273f8, x=0x0)
at /builds/espressomd/espresso/build/src/python/espressomd/particle_data.cpp:33957
rank 1:
#6 0xf7080348 in PMPI_Recv () from target:/usr/lib/i386-linux-gnu/libmpi.so.20
#7 0xf7a800de in boost::mpi::detail::packed_archive_recv(ompi_communicator_t*, int, int, boost::mpi::packed_iarchive&, ompi_status_public_t&) () from target:/usr/lib/i386-linux-gnu/libboost_mpi.so.1.65.1
#8 0xf7a73226 in void boost::mpi::broadcast<boost::mpi::packed_iarchive>(boost::mpi::communicator const&, boost::mpi::packed_iarchive&, int) () from target:/usr/lib/i386-linux-gnu/libboost_mpi.so.1.65.1
#9 0xf6dfec40 in boost::mpi::detail::broadcast_impl<IA_parameters> (values=0x57aab3d0, root=0, n=1, comm=...)
at /usr/include/boost/mpi/collectives/broadcast.hpp:118
#10 0xf6dfee52 in boost::mpi::broadcast<IA_parameters> (root=0, value=..., comm=...)
at /usr/include/boost/mpi/collectives/broadcast.hpp:128
#11 mpi_bcast_ia_params_slave (i=0, j=0) at /builds/espressomd/espresso/src/core/communication.cpp:1204
#12 0xf6e540eb in std::function<void (int, int)>::operator()(int, int) const (__args#1=<optimized out>,
__args#0=<optimized out>, this=<optimized out>) at /usr/include/c++/7/bits/std_function.h:706
#13 Communication::MpiCallbacks::loop (this=0x57a74820) at /builds/espressomd/espresso/src/core/MpiCallbacks.cpp:91
#14 0xf6dfc8b9 in mpi_loop () at /builds/espressomd/espresso/src/core/communication.cpp:2312
#15 0xf7ef4447 in init_init () at /builds/espressomd/espresso/build/src/python/espressomd/_init.cpp:1149
|
https://gitlab.icp.uni-stuttgart.de/espressomd/espresso/pipelines/4521 @RudolfWeeber, the collision detection problem still exists. @fweik, i386 still has four deadlocks, two of which involve Boost: during interactions_non-bonded: rank 0:
rank 1:
during rigid_bond: rank 0:
rank 1:
during virtual_sites_tracers: rank 0:
rank 1:
during observable_cylindrical: rank 0:
rank 1:
|
I've merged the current master and #2431. We'll see how it goes. |
I can re-package for the next release and see what still fails. |
@fweik, the above MPI deadlock on i386 during virtual_sites_tracers still happens. So that one wasn't Boost's fault and still needs to be fixed. And there seems to be a new one in dpd. Rank 0:
rank 1:
And a new one in collision_detection. Rank 0:
Rank 1:
Also, the one reported for npt above has come back. |
I rechecked with a newer Boost (and Ubuntu). With cosmic and Boost 1.67 on i386, only the collision detection test fails with |
|
I think the virtual_sites_tracers one is genuine; it deadlocks in the LB halo communication. Unfortunately this code is not easy to read (for me). The other one looks like the issues we had before. (One of the failure modes of the Boost problem is that one communication is skipped on the sending side, which leads to a deadlock somewhere later.) |
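To illustrate the failure mode described above, here is a minimal sketch in plain Python. It does not use MPI or Boost; two threads and a `queue.Queue` stand in for the two ranks and the channel between them. All names (`run_exchange`, `skip_first_send`, the message strings) are made up for illustration, and a timeout stands in for the hang so that the demo terminates.

```python
import queue
import threading


def run_exchange(skip_first_send, timeout=0.2):
    """Simulate two ranks that should exchange two messages.

    Rank 0 is supposed to send twice and rank 1 posts two matching
    receives. If rank 0 silently skips one send, rank 1 first consumes
    a later message in its place and then blocks on a receive that is
    never matched.
    """
    channel = queue.Queue()  # hypothetical stand-in for the MPI channel
    received = []
    status = {}

    def rank0():
        if not skip_first_send:
            channel.put("halo")
        channel.put("params")

    def rank1():
        try:
            for _ in range(2):
                # A real blocking receive would wait forever here.
                received.append(channel.get(timeout=timeout))
            status["state"] = "completed"
        except queue.Empty:
            status["state"] = "deadlock"

    threads = [threading.Thread(target=rank0), threading.Thread(target=rank1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status["state"], received


if __name__ == "__main__":
    print(run_exchange(skip_first_send=False))
    print(run_exchange(skip_first_send=True))
```

Note that in the skipped case the first receive still succeeds, just with the wrong message, so the hang only shows up at a later communication, which is why the backtraces can point at unrelated code.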
After upgrading to Boost 1.67, all segfaults are actually gone, so maybe we can still get this pull request into Espresso 4.0.1. However, the coordinate folding issue in collision detection is back, @RudolfWeeber:
|
Please merge pr2422 into this one. |
@RudolfWeeber, it still fails after merging the current master |
I don't think it's in the master, you have to merge the mentioned pr. Should be a change in grid.hpp |
No, it's there. Also, I meant python, not master. |
@mkuron I set this up for another project, but it might be nice here too, as you would get testing on x86_64, i386 and ppc64le and a couple of different distros. You can use https://src.fedoraproject.org/rpms/espresso/blob/master/f/espresso.spec as a starting point. |
Otherwise the collision detection test fails due to some unfolded coordinates
ready to merge |
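For readers unfamiliar with the term, "folding" here means mapping a coordinate back into the primary periodic box. The following is a minimal sketch in plain Python, not the ESPResSo implementation; `fold_coordinate` and `fold_position` are hypothetical helper names, and the box length of 1 matches the value quoted earlier in the thread.

```python
import math


def fold_coordinate(x, box_l):
    """Map x into [0, box_l) and return (folded, image_count)."""
    image = math.floor(x / box_l)
    folded = x - image * box_l
    # Guard against rounding pushing the result onto the upper boundary.
    if folded >= box_l:
        folded -= box_l
        image += 1
    return folded, image


def fold_position(pos, box):
    """Fold every component of a position into the primary box."""
    return [fold_coordinate(x, l)[0] for x, l in zip(pos, box)]


if __name__ == "__main__":
    box = [1.0, 1.0, 1.0]
    print(fold_position([1.25, -0.25, 0.5], box))
```

A test that compares an unfolded position against a folded reference will report a mismatch in every component that lies outside the box, even though both describe the same point under periodic boundary conditions.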
fixes #2258
These tests will only run once a week and for tagged releases.
Should be tagged for Espresso 4.0.1 as it is part of the release checklist. We only need the .gitlab-ci.yml changes there though as I think everything else was caused by changes since the 4.0 release.